[57] ABSTRACT

A computer system for user speech actuation of access to
stored information, the system including a central processing
unit, a memory and a user input/output interface including
a microphone for input of user speech utterances and
audible sound signal processing circuitry, and a file system
for accessing and storing information in the memory of the
computer. A speech recognition processor operating on the
computer system recognizes words based on the input
speech utterances of the user in accordance with a set of
language/acoustic model and speech recognition search
parameters. Software running on the CPU scans a document
accessed by a web browser to form a web triggered word set
from a selected subset of information in the document. The
language/acoustic model and speech recognition search
parameters are modified dynamically using the web triggered
word set, and used by the speech recognition processor
for generating a word string for input to the browser to
initiate a change in the information accessed.

12 Claims, 2 Drawing Sheets

WEB TRIGGERED WORD SET BOOSTING
FOR SPEECH INTERFACES TO THE 
WORLD WIDE WEB 

BACKGROUND OF THE INVENTION 

A portion of the disclosure of this patent document
contains material which is subject to copyright protection. 
The copyright owner has no objection to the facsimile 
reproduction by anyone of the patent document or the patent 
disclosure, as it appears in the Patent and Trademark Office 
patent file or records, but otherwise reserves all copyright 
rights whatsoever. 

This invention relates to a system and method for access- 
ing information via the internet, and more particularly to a 
speech interface to information resources on the World Wide 
Web (WWW). 

A dominant theme of many new companies has been 
software related to information access on the internet, as 
evidenced by immense posturing and positioning of almost 
all major computer companies in the arena of internet. One 
of the critical components of this explosion in interest 
directly relates to the interaction of the user with the internet 
information resources. A few approaches, including web
surfing using browsers, web crawlers and search engines, are
the most common means of accessing this internet-wide
information infrastructure.

The increase in the number of users, the number of 
servers, the volume of data transferred and the increasing 
advertising dollars spent on the internet will only fuel larger 
growth of the internet. It is hoped that this road will 
eventually lead to a society in which internet access by
ordinary people at home becomes as pervasive and easy to
use as a TV or telephone. A number of technologies are
already gaining momentum in making the dream of providing
internet access at home a reality. However, one of the
real problems in using this technology is the not-so-seamless 
nature of the computer, and perhaps more importantly, the 
human-computer interface. One essential component to 
make internet access as seamless and as easy to use as a 
TV-remote is the capability to use speech commands to 
navigate, query and search the cyberspace. If nothing else, 
this capability will at least eliminate the need to use a 
keyboard/mouse interface. 

Very recently, a few companies have actively tried to 
provide quick speech access to the web, with mixed success. 
Building a speech recognition system as an interface to the 
World Wide Web is a very different problem from those 
previously encountered in other speech recognition 
domains, such as read or spontaneous speech. The primary 
problem is huge vocabularies: the user can virtually access 
any document on the internet, about any topic. One way of 
reducing the vocabulary problem is by having "smart speech 
recognizers" (such as the one described in this document). 
The Out-Of-Vocabulary (OOV) problem is another serious
difficulty in using speech interfaces for web navigation.
Lack of sequential word context is yet another important
issue where traditional n-gram language models fail.

Issues in Web Surfing with Speech

Described below are some of the most common issues 
related to having a general speech interface to the WWW: 
Out-Of-Vocabulary Words

Properly dealing with unknown words is an unsolved
problem in continuous speech recognition. Even with very
large vocabularies, a proper method of handling
out-of-vocabulary (OOV) words is not available. In the context of
web surfing, the problem becomes even more prominent. This is
due to the increased use of abbreviations, short-forms, proper
names, and virtually unlimited vocabulary access. One
simple solution is to tailor the links (i.e., by renaming them) to
speech recognition engine vocabularies. Even so, speech
recognition accuracies with state-of-the-art technologies are not
good enough, mainly due to the absence of (language
modeling) context in link names (or link referents), for use
with very large vocabulary systems. It is necessary to get the
in-vocabulary words correct even in the presence of OOV
words.

On-the-fly Pronunciation Generation 

Unlike general continuous speech recognition tasks,
applications such as web surfing have accessible information
about the vocabulary; for instance, the web links that are
being spoken are accessible through the contents of the web
page currently being viewed. Such information can be
utilized to improve speech recognition performance. Since
one has access to the list of words that are present in the
presently viewed page, the system can determine which
words are OOV words. It is important to note that the letter
spelling of (some of) the OOV words is available: using
letter to phoneme mappings, the phonemic transcription of
such OOV words can be generated, and utilized in decoding
the uttered speech signal. This is commonly referred to as
on-the-fly pronunciation generation, and is still an area of
active research.
Language Models 

The use of language models has profoundly improved the
performance of continuous speech recognition systems on
various tasks. However, language modeling is a very difficult
issue for web surfing. For certain problems which
require interactive dialogue with the system, traditional
n-gram language models may help; however, in many cases,
the speakable links are short sequences of words that do not
frequently occur in the large textual corpus (including web
pages) that is used to build the language models. Static
language modeling is a weak component for web surfing
applications of speech recognition systems. Issues relating
to dynamic language models such as on-the-fly grammar
generators are discussed in the Summary of the Invention
below.

Prior Art Attempts at Speech-Actuated Web Access

Enumerating Links

A simple method of addressing the problem of web access
using speech is to enumerate all the links in a page, and have
the user read out the number of the link in order to access it.
This is the approach taken in the beta version of Kolvox's
Voice companion to the internet. Clearly, this is an unnatural
way of accessing the web, and fails for html documents such
as forms, and search engines.

Dynamic RGDAG

Texas Instruments is developing a speech interface for the
WWW, using a Directed Acyclic Graph (DAG) of probabilistic
regular grammars (RGs) to constrain the recognition
search. See Charles T. Hemphill and Philip R. Thrift, "Surfing
the Web by Voice", in Proceedings of ACM Multimedia '95.
Whenever a new page is accessed, the corresponding
RGDAG is created and used for constraining the
speech recognition search. This is extended to the notion of
smart pages which contain a reference to a grammar that
interprets the result of that grammar.

RGDAGs are unfavorable for use in speech recognition
systems for several reasons. First, building RGDAGs
dynamically involves a lot of computation overhead.
Second, it is well-known that restarts and mid-utterance
corrections are a common feature of spontaneous speech,
and are very difficult, if not impossible, to model correctly
with Directed Acyclic Graphs (DAG) of probabilistic regular
grammars (RGs). Thus, if word repetitions are not
explicitly modeled, then the TI system would fail for the
following utterance:

GOTO THE <pause> THE C S HOME PAGE

The link in the above example is GOTO THE C S HOME
PAGE.

The second example where the TI system fails in its
present form is when known words are spoken, although
they are not present in the link referent. Assuming a static
lexicon which includes the word CITY, and assuming that
the link word sequence is exactly THE SAN FRANCISCO
MAP, then a system based on RGDAG would fail (due to the
insertion of CITY) when the user utters the following:

THE SAN FRANCISCO CITY MAP

When out-of-vocabulary (OOV) words are embedded in
known words, the RGDAG approach typically fails to
recover the correct in-vocabulary words. For instance, if the
word PRICE is absent from the vocabulary, and the only link
in the page is called THE CURRENT STOCK QUOTES,
then the RGDAG method would fail for the following
utterance:

THE CURRENT STOCK PRICE QUOTES

It is infeasible to build RGDAGs characterizing very large
vocabulary spontaneous speech. Accordingly, a need
remains for a better form of speech interface to a computer,
in particular, one that is adapted to the broad vocabulary
encountered on the World Wide Web, but able to respond to
short speakable sequences of words not necessarily found in
the large textual base of web pages and other Web-accessed
documents.

SUMMARY OF THE INVENTION

Our approach is to employ statistical n-gram based grammars,
which are in vogue. Typically n-grams allow looser
constraints to be incorporated into the search process
when contrasted with RGDAGs. While n-grams do not
impose strict word sequence constraints, the problem is that
they are usually statically trained on a fixed text
corpus.

In the context of speech interfaces to the web, the invention
dynamically makes use of information provided by
links in a document or in the current page of a source
document being viewed (and recently viewed pages). To our
knowledge, nobody has proposed the concept of dynamically
altering language model scores (such as n-gram) and
acoustic model scores using a web-triggered word set.

In the most straightforward implementation, the document
is an HTML source document containing hypertext
links, which are used in the present invention to identify the
word set to be included in the final word set for probability
boosting the word recognition search. The links are not
limited to hypertext links, however, and the word set can be
augmented in several ways: by adding pre-specified function
and command words; by adding text words from the current
page and other text words such as keywords identified in the
document; and the like.

The actual modification of the language model and acoustic
model scores can be implemented in a variety of
methods, and the formulation presented in the Detailed
Description is general and includes methods such as word
set probability boosting, and dynamically altering language
model probabilities using word set information.

The foregoing and other objects, features and advantages
of the invention will become more readily apparent from the
following detailed description of a preferred embodiment of
the invention which proceeds with reference to the
accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of an overview of the Web trigger
concept for speech interfaces to the World-Wide Web
according to the invention.

FIG. 2 is a block diagram of a system implementing the
invention of FIG. 1.

FIG. 3 is a flowchart of the speech recognition process
employed in the system of FIGS. 1 and 2.

APPENDIX 1 is a listing of pertinent portions of code for
a web-triggered HTML link extractor for use in implementing
the present invention.

APPENDIX 2 is a listing of pertinent portions of code for
modifications to a conventional speech recognition program
to integrate web-triggered word set boosting into the
recognition process.

APPENDIX 3 is a listing of a Web browser display
updating program in accordance with the invention.

DETAILED DESCRIPTION

Web Triggers Concept

Based on the above discussions, it should be clear that a
mechanism for utilizing very large vocabulary systems,
while biasing the search towards Web page dependent
knowledge, is needed. The invention uses a word set mechanism
to incorporate biases towards certain trigger-specific
information.

Acoustic and language scores are usually computed separately
in most speech recognition systems. Language models
are derived off-line using a large text corpus, and acoustic
models are optimized using lower level criteria (such as
phonetic transcription accuracy). Furthermore, the two
scores have to be combined in some manner. Bayes rule is
applied in most recognition systems to
find the most likely word sequence.

Typically, during the search for the "optimal" word
sequence, an evaluation function is used in all speech
recognizers. If W is a word sequence, given the acoustic
observation X, then a combined estimate of the "goodness"
of a possible word has the following general form:

$$ F\left(S_{LM}(W),\ S_{AC}(X \mid W)\right) \qquad (1) $$

$S_{LM}$ is referred to as the language model score, and $S_{AC}$
is called the acoustic model score. In the context of speech
interfaces to the World Wide Web, we redefine the above
equation as follows:

$$ F\left(S_{LM}(W \mid H),\ S_{AC}(X \mid W, H)\right) \qquad (2) $$

where H is the web triggered word set, and F is a function that
combines the acoustic and language model scores in some
form. Thus, the set of words that are present in the web page
(or that specify links) currently being viewed can be treated
as a word set that is triggered by the web page. The term
"web triggers" is coined to indicate such a relationship. The
speech recognition system biases its search towards this set
of web triggers by modifying the language model and/or
acoustic model scores using the web triggered word set
information. This is illustrated in FIG. 1. The web triggered
word set is in essence a special vocabulary that varies
dynamically.

Rather than stress the rigid sequential constraints that
RGDAGs enforce, the web-triggered word set approach
biases the search for the "best" word sequence
towards a set of words which is dynamically
produced depending on the web page sources of the web
pages recently viewed. This word set biasing is achieved by
incorporating the web-triggered word set as an additional
piece of information in computing the speech recognition
scores for the different word paths/sequences.

The actual integration of the dynamic adaptation of the
web page context information can be performed in many
ways. One method that was utilized in the current working
example is word set probability boosting by adapting/modifying
word dependent language model weights, as
described below in the section on Implementation.

The organization of a system 10 implementing the inven- 
tion can be quite general as depicted in FIG. 2. At one end 
is the user with a local computer/LAN 12 (note that the user 
is symmetric in the sense that he/she could be anywhere in 
the network) and is connected to other remote servers/ 
LANs/Machines/Users 14 through a wide area network 16 
like the INTERNET. The goal of such a connection is to 
maximize information communication between these vari- 
ous users/systems. The invention can also be implemented 
and used within a local area network or just on the user's 
computer. 

In the context of this invention, the user (depicted here at
the local machine 12) will be using speech input via microphone
and digital signal processing (DSP) as an interface to
access the information on computer(s) 14 across the network
16. The user interacts with the local machine with the same
interface as always (a Netscape/Mosaic web browser)
depicted as WWW Browser 20 in FIG. 1. The only constraint
in our implementation is that the client browser 20
can interpret HTML documents 22, and can execute Java
applets. In order to move from the current page to a different
one, the user simply voices the highlighted words of the http
link of interest.

For the purpose of the presented algorithm, the actual 
speech recognition process (system 24 in FIG. 1) can occur 
anywhere in the network, so long as it is fast enough and can 
access the digital speech samples of the user's speech and 
can control what information is displayed to the local user. 

In the experiments described below, the speech recognition
system was running on a SUN SPARC10 workstation
connected through a gateway server to the INTERNET
using Netscape or any other Java-compatible WWW
browser. The speech recognition software is a University of
Rochester version of SPHINX-II, a publicly available program
written on top of Carnegie-Mellon University's
Sphinx-II, as described in "An Overview of the SPHINX-II
Speech Recognition System", Xuedong Huang, Fileno
Alleva, Mei-Yuh Hwang, and Ronald Rosenfeld, ARPA '93,
with the addition of the web triggered word boosting in
accordance with the present invention. The user voice utterances
are input via a conventional microphone and voice
input and sampling interface to the computer. The CS
department homepage at University of Rochester and some
personal home pages were used as example documents that
could be navigated using speech triggers. Some other versions
which ran on remote machines were also experimented
with.

Description of Speech Recognition Process 
Initialization 

0) Set initial web page context to be last remote/locally 
accessed html page in the web trigger generator of 
Appendix 1, in the speech recognition program mods of 
Appendix 2, and in the web-triggered HTML extractor 
program in Appendix 3. 

Referring to FIG. 3, for each utterance, do the following 
steps: 

1) Process input speech (step 30) using standard prepro- 
cessing and parameterization techniques of speech 
input. 

2) Depending on the source of the currently viewed
HTML document, select the web-triggered word-set
list (step 32). (NOTE: This assumes that a
remote/local source has been or can be accessed
through the network.)

3) Modify the appropriate language model and/or acoustic
model parameters dynamically in step 34, using the
selected word-set list (see step 32), to be used during
the speech recognition search process. See Appendix 2.

4) Perform the actual speech recognition process (step 36)
by using the parameters chosen dynamically (step 34)
depending on the web page context.

5) Depending on the speech recognition search output
(step 38), update the information for feedback to the
user (such as loading a new HTML page) or perform
an action (such as updating a database, or executing a
remote/local program) as shown in steps 40 and 42. The
actual process of updating the Web Browser display is
shown in Appendix 3. The actual process of updating
the web-triggered word set is partially done in Appendix 1
and augmented as desired by the user.

6) Process next utterance (return to step 1). 
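
Step 5 requires mapping the recognized word string onto one of the links of the current page. The following is a minimal sketch of one way this could be done, assuming a simple bag-of-words overlap and the 30% match threshold mentioned in the experimental section. The class name LinkMatcher, its methods, and the tie-breaking rule are hypothetical and are not taken from the appendices.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LinkMatcher {
    // Assumed threshold: at least 30% of a link's words must be recognized.
    static final double MIN_MATCH_FRACTION = 0.30;

    // Fraction of the link's words that appear anywhere in the recognized string.
    static double matchFraction(List<String> recognized, List<String> linkWords) {
        if (linkWords.isEmpty()) {
            return 0.0;
        }
        Set<String> heard = new HashSet<>(recognized);
        int hits = 0;
        for (String word : linkWords) {
            if (heard.contains(word)) {
                hits++;
            }
        }
        return (double) hits / linkWords.size();
    }

    // Index of the best-matching link, or -1 if no link reaches the threshold.
    // Ties are resolved in favor of the earlier link (an arbitrary choice).
    static int bestLink(List<String> recognized, List<List<String>> links) {
        int best = -1;
        double bestFraction = 0.0;
        for (int i = 0; i < links.size(); i++) {
            double fraction = matchFraction(recognized, links.get(i));
            if (fraction >= MIN_MATCH_FRACTION && fraction > bestFraction) {
                best = i;
                bestFraction = fraction;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<List<String>> links = Arrays.asList(
            Arrays.asList("DEPARTMENT", "SUBWAY", "MAP"),
            Arrays.asList("C", "S", "HOME", "PAGE"));
        // The repeated THE from a restart does not prevent the match.
        List<String> recognized =
            Arrays.asList("GOTO", "THE", "THE", "C", "S", "HOME", "PAGE");
        System.out.println(bestLink(recognized, links)); // prints 1
    }
}
```

Because the overlap ignores word order and extra words, a repeated or inserted word in the utterance does not by itself invalidate the link, which is the behavior the word set approach is meant to allow.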

In the current implementation, the web browser 20 loads
a particular HTML source document 22, which starts up a
looping JAVA applet (the source of such a sample program
is given in Appendix 3 and titled KeepLoading.java). This
JAVA program simply keeps accessing a particular
page which is stored in a file readable by
the WWW Browser 20 and both readable and modifiable by the
Speech Recognition System 24. This file is typically stored
on the local computer but can be remote as long as it is
accessible by both WWW Browser 20 and Speech Recognition
System 24. This file is constantly updated by the
speech recognition system depending on the recognition
result of the local user's speech. It is constantly polled by the
JAVA program of Appendix 3. In the current version, the
speech recognizer is given information such as the list of
links and corresponding word sequences which it utilizes in
conjunction with the invention in order to recognize the
speech, and updates the output HTML document location if
necessary.
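
The shared-file handshake described above can be illustrated with a small polling sketch. This is not the KeepLoading applet of Appendix 3; it is a simplified stand-in that assumes the shared file contains a single line holding the location of the page to display. The class name SharedFilePoller and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class SharedFilePoller {
    // Last non-empty location seen in the shared file.
    private String lastLocation = null;

    // Returns the trimmed content if it is non-empty and differs from the
    // previously seen location; otherwise returns null (nothing to do).
    String changedLocation(String fileContent) {
        String current = (fileContent == null) ? "" : fileContent.trim();
        if (current.isEmpty() || current.equals(lastLocation)) {
            return null;
        }
        lastLocation = current;
        return current;
    }

    // Reads the shared file once and reports a change, if any.
    String poll(Path sharedFile) throws IOException {
        if (!Files.exists(sharedFile)) {
            return null;
        }
        byte[] bytes = Files.readAllBytes(sharedFile);
        return changedLocation(new String(bytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("current-page", ".txt");
        SharedFilePoller poller = new SharedFilePoller();
        // The recognizer side would write the new location into the file.
        Files.write(file, "http://www.cs.rochester.edu/".getBytes(StandardCharsets.UTF_8));
        System.out.println(poller.poll(file)); // new location reported once
        System.out.println(poller.poll(file)); // null, because nothing changed
        Files.delete(file);
    }
}
```

A browser-side loop would call poll at a fixed interval and load the returned location whenever it is not null.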

A simple version of the HTML document that has to be
loaded in order to execute the looping JAVA program of
Appendix 3 is described in the following example.

Example

As an example, assume that the user is currently viewing 
the home page of the Univ. of Rochester Computer Science 
Department. Using a simple JAVA program, such as the 
example listed in Appendix 3, the currently viewed page can 
be automatically accessed, and a Link-Referent table created 
as shown (see Table 1 for some examples). 

The information shown in the table was extracted auto- 
matically by a simple parsing JAVA program shown in 
Appendix 1. The set of words constituting the link referent 
can constitute a web triggered word set, and it would make 
sense to bias the speech recognition search towards this set 
of words since it is likely that the user will utter them. This 
web triggered word set can be supplemented with additional 
command words, function words, and even other triggered 
words that are commonly used in conjunction with them 
(e.g., utterance triggers of web triggered words). 

Thus, the source word set of the web triggered word set, for
this example, would consist of {BACK, HOME, C, S, U,
OF, R, DEPARTMENT, BROCHURE, TECHNICAL,
REPORT, COLLECTION, ANONYMOUS, F, T, P,
ARCHIVE, SUBWAY, MAP, RESEARCH, PROJECTS,
COURSE, INFORMATION, APPLICATION,
GRADUATE, STUDY, UNDERGRADUATE, PROGRAM,
REPORTS, DIRECTIONS, TO}.
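
As an illustration of how a source word set like the one above might be assembled from link referents, the following minimal sketch splits each referent on whitespace and collects the distinct upper-cased words. The class name SourceWordSet and the method fromReferents are hypothetical. The sketch does not spell acronyms letter by letter (for example FTP as F, T, P) and does not filter words against the recognizer lexicon, both of which the word set listed above appears to reflect.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class SourceWordSet {
    // Collects the distinct words of all link referents, upper-cased,
    // in the order in which they are first seen.
    static Set<String> fromReferents(List<String> referents) {
        Set<String> words = new LinkedHashSet<>();
        for (String referent : referents) {
            for (String token : referent.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    words.add(token.toUpperCase(Locale.ROOT));
                }
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> referents = Arrays.asList(
            "BACK", "HOME", "DEPARTMENT BROCHURE",
            "DEPARTMENT SUBWAY MAP", "Technical Reports");
        System.out.println(fromReferents(referents));
        // prints [BACK, HOME, DEPARTMENT, BROCHURE, SUBWAY, MAP, TECHNICAL, REPORTS]
    }
}
```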

TABLE 1

Link Referents and addresses

Link                                                Referent

http://www.cs.rochester.edu/u/sarukkai/             BACK
http://www.cs.rochester.edu/u/sarukkai/home.html    HOME
http://www.cs.rochester.edu/                        U OF R CS
http://www.cs.rochester.edu/brochure/               DEPARTMENT BROCHURE
http://www.cs.rochester.edu/trs/                    TECHNICAL REPORT COLLECTION
ftp://ftp.cs.rochester.edu/pub                      ANONYMOUS FTP ARCHIVE
http://www.cs.rochester.edu/subway/                 DEPARTMENT SUBWAY MAP
http://www.cs.rochester.edu/users/                  DEPARTMENT PEOPLE
http://www.cs.rochester.edu/subway/                 DEPARTMENT SUBWAY MAP
http://www.cs.rochester.edu/research/               DEPARTMENT RESEARCH PROJECTS
http://www.cs.rochester.edu/courses/                COURSE INFORMATION
http://www.cs.rochester.edu/admit/                  APPLICATION FOR GRADUATE STUDY
http://www.cs.rochester.edu/undergrad/              UNDERGRADUATE PROGRAM
http://www.cs.rochester.edu/trs/                    TECHNICAL REPORTS
http://www.cs.rochester.edu/directions/             DIRECTIONS TO URCS

Thus, the web-triggered word set H can consist of the
following:

A basic set of command and function words chosen a
priori. This word set is referred to as the basic word set.

A set of words selectively extracted from the web page
source that is being currently displayed by the browser. This
is referred to as the source word set.

A set of words "related" to or "triggered" by the source
word set.

A short-term cache which facilitates keeping around
recently formed web triggered word sets.

Implementation

The actual implementation of equation (2) can be done in
numerous ways. For example, one method would be to
apply word set boosting, and boost the language model
scores for a selected set of words. In this case, the word set
is a function of the web page being accessed.

When combining acoustic and language model scores in 
a Bayesian fashion, the language model and acoustic model 
probability products should be viewed as scores rather than 
probabilities. In practice, however, since the sources of 
information are very different, and the true probability 
distributions cannot be accurately estimated, the straightfor- 
ward application of Bayes rule will not lead to a satisfactory 
recognition performance. Therefore, it is common to weight 
the acoustic and language model scores separately so as to 
optimize performance on some held-out training data. The 
language weights may also be tuned using actual acoustic 
and language model scores in a unified stochastic model as 
was demonstrated by "Unified Stochastic Engine (USE) for 
Speech Recognition", Huang, Belin, Alleva, and Hwang, 
Proc. of ICASSP '93, pp. 636-639.

Bahl et al. in "Estimating Hidden Markov Model Parameters
So As To Maximize Speech Recognition Accuracy",
Lalit R. Bahl, Peter F. Brown, Peter V. de Souza, and Robert
L. Mercer, IEEE Trans. on Speech and Audio Processing,
vol. 1, no. 1, pp. 77-83, January 1993, have explored the idea
of estimating Hidden Markov Model parameters so as to
maximize speech recognition accuracy using an acoustically
confusable list of words.

Interpreting the language model weights as "boosting" 
values enables the formulation of utterance specific trigger- 
ing effects in dialogues so as to improve speech recognition 
accuracy. In addition, an adaptive algorithm for tuning these 
word dependent language model weights is given in "Word 
Set Probability Boosting for Improved Spontaneous Dia- 
logue Recognition", Ramesh R. Sarukkai and Dana H. 
Ballard, published as a Technical Report at University of 
Rochester entitled "Word Set Probability Boosting Using 
Utterance and Dialog Triggers for Improved Spontaneous 
Dialogue Recognition: The AB/TAB Algorithms", Ramesh 
R. Sarukkai and Dana H. Ballard, URCS TR 601, Dept. of
Computer Science, Univ. of Rochester, December 1995 and
to appear in IEEE Transactions on Speech and Audio 
Processing. 

The word set boosting framework extends the above 
concepts in the following manner. First, let us assume there 
is some a priori information which enables us to predict a set 
of words H for the particular speech utterance in consideration.
The altered Bayes-like scoring function to minimize is
now

$$ \Pr(X \mid W) \times \Pr(W)^{\Omega(W,H)} \qquad (3) $$
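
Equation (3) can be illustrated in the log domain, where raising the language model probability to a word-dependent power becomes multiplying its logarithm by that power. The sketch below assumes two illustrative exponent values, one for words inside the web-triggered set H and one for words outside it; these values, the class name WordSetBoosting, and the method boostedLogScore are hypothetical and are not taken from Appendix 2. The sketch treats the result as a log probability for which larger is better. Because a probability below one has a negative logarithm, an exponent below one raises the score of a word and an exponent above one lowers it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class WordSetBoosting {
    // Log-domain form of equation (3) for a single word:
    //   log Pr(X|W) + Omega(W,H) * log Pr(W)
    // omegaInSet applies to words in the web-triggered set, omegaOutOfSet to the rest.
    static double boostedLogScore(double acousticLogProb, double lmLogProb,
                                  String word, Set<String> webTriggers,
                                  double omegaInSet, double omegaOutOfSet) {
        double omega = webTriggers.contains(word) ? omegaInSet : omegaOutOfSet;
        return acousticLogProb + omega * lmLogProb;
    }

    public static void main(String[] args) {
        Set<String> webTriggers = new HashSet<>(Arrays.asList("SUBWAY", "MAP"));
        double acousticLogProb = -10.0;      // same acoustic evidence for both words
        double lmLogProb = Math.log(0.001);  // same language model probability
        double inSet = boostedLogScore(acousticLogProb, lmLogProb,
                                       "SUBWAY", webTriggers, 0.5, 1.5);
        double outOfSet = boostedLogScore(acousticLogProb, lmLogProb,
                                          "SANDWICH", webTriggers, 0.5, 1.5);
        // The web-triggered word ends up with the higher (better) log score.
        System.out.println(inSet > outOfSet); // prints true
    }
}
```

With both exponents set to one, the function reduces to the unweighted sum of the acoustic and language model log probabilities.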

There is a special set of Omega(W,H) values for words W 
belonging to the predicted/web-triggered set H, so as to 
improve the scores of such words. There are other values of 
Omega for words not belonging to the set H and thus the 
scores are possibly attenuated. Omega(W,H) is a word- 
dependent factor and essentially the Language model "prob- 
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ability" is raised to the power of Omega(W) in the modified Instances where web triggered word sets help dramati- 

algorithm. The vector Omega is termed as the word boosting cally are letter sequences such as "U of R C S Department", 

vector, and links which include out of vocabulary words. The word 

Now in the context of web triggered word set boosting, boosting approach is able to recover some of the correct 

the idea is to simply assign a value for Omega such that 5 words and thus improve page access performance, 
words that are "predicted/expected" using the Web page To summarize, web-triggers addresses many of the pre- 

context (including the HTML source) are scored better than viously presented issues. These advantages are enumerated 

the rest of the words during the search process, and therefore below. 

are more likely to show up in the best word sequence. Building grammars dynamically involves a lot of com- 

Thus, the language model scores of every word present in putation overhead. The web-trigger approach does not 

the predicted word set, are boosted (improved) by the dynamically vary the vocabularies. The web triggered word 

corresponding factor during the search process. Similarly, set boosting just selectively alters the "scores" that are 

words that are absent from the predicted word set are assigned to the different words, treating the web triggered 

assigned another appropriate Omega(W) (so as to typically 15 wor<i sets diff^^y- 

diminish/attenuate their effective scores). Exact word se q ueoce grammars are too restrictive: for 

The concept of extracting web-triggered word set infer- exam P le > repeating a word would throw the grammar pars- 

mation depending on the context of the web pages recently *g^ ch recognition system off unless the repetition was 

viewed can also be implemented in other methods. One ^afly moddcd. Since the selective web triggered word 

method would be to appropriately smooth/re-estimate 20 &&t P^abuity boosting just biases the speech recognition 

r*> r j ' search towards a specific subset of the vocabulary, word 

n-gram language model scores using the HTML sources of 6nd ^ restarls can be handJed weU Qf ^ , he 

the documents recently viewed. Another would be to appro- web wofd ^ b(J0Stin ch b ^ on Qf 

pnately modify/weight acoustic model parameters. a model> an n ^ ram 

Experimental Result .,„...„. 25 The grammar models degrade ungracefully in the pres- 

Using Camegie-MeUon University's Sphmx-H system as ence of Q0V worfs , n c eyen ^ ^ £ q{ 

the baseline, and applying the word boosting concept using web
triggers, a series of page access experiments were performed using
mostly local home pages. The speech recognizer lexicon consisted of
1728 words, and a bigram language model was constructed using the
ATIS and Trains93 corpora.

The link words are accessed using the source of the hypertext markup
language (HTML) document currently being viewed. The speech
recognizer then boosts the probability of these web triggered word
sets while determining the best word sequence. This word sequence is
matched with the stored word sequences corresponding to all the links
possible from the currently viewed HTML document, in order to
determine the next best link to take. The recognized words should
match at least 30% of the link words correctly in order to be valid.
The results shown in the table below are for utterances accessing
various pages.

% Correct page access with and without web trigger word set boosting

Method                      % Correct Page Access
Web-Triggered Word Sets     75.93%

OOV words, since the probabilities of words belonging to the web
triggered word set are boosted, the web-triggering approach enables
many of the correct words to be recovered even when OOV words are
present.

It is infeasible to build grammars for very large vocabulary
spontaneous speech. The alternative language models in vogue are
n-grams, and the web triggered word set approach easily caters to
integration with n-grams in order to improve speech recognition
accuracies.

The concept of the present invention can be enhanced in a number of
ways. One way is to provide a cache that retains the web triggered
word sets for recently viewed documents so that the user can more
readily re-access a selected one of those documents. For example, by
saving the name and keywords of a document that has just been
accessed, the user can return to it by saying "Go back to (name of
document or other descriptor)." Such a cache would be implemented by
saving a number of the previous word set lists and assigning a
weighting value to each according to its recency, or frequency of
access, or other suitable weighting factors.

Having described and illustrated the principles of the invention in a
preferred embodiment thereof, it should be apparent that the
invention can be modified in arrangement and detail without departing
from such principles. We claim all modifications and variation coming
within the spirit and scope of the following claims.
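The cache of previous word set lists described above could be sketched in Java roughly as follows. This is an illustrative sketch only, not part of the disclosed implementation: the class name WordSetCache, the capacity and decay parameters, and the choice of a geometric recency weight are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical recency-weighted cache of web triggered word sets. */
class WordSetCache {
    private final int capacity;   // assumed: number of previous word sets retained
    private final double decay;   // assumed: weight multiplier per step of age
    // Word sets of recently viewed documents, most recent first.
    private final Deque<Set<String>> recent = new ArrayDeque<>();

    WordSetCache(int capacity, double decay) {
        this.capacity = capacity;
        this.decay = decay;
    }

    /** Records the word set of the document that has just been accessed. */
    void add(Collection<String> words) {
        recent.addFirst(new HashSet<>(words));
        if (recent.size() > capacity) {
            recent.removeLast();  // forget the oldest word set
        }
    }

    /**
     * Returns decay^age for the most recent cached set containing the word
     * (age 0 for the latest document), or 0.0 if no cached set contains it.
     */
    double weight(String word) {
        double w = 1.0;
        for (Set<String> set : recent) {
            if (set.contains(word)) {
                return w;
            }
            w *= decay;
        }
        return 0.0;
    }
}
```

A recognizer could consult weight() when choosing how strongly to boost a word, so that words from the current document receive the full boost and words from older documents a progressively smaller one. Weighting by frequency of access instead would require a counter per word set.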
APPENDIX 1: Web-Triggered HTML Link Extractor: JAVA source code

Brief Description of the following JAVA class LoadPage:

The following JAVA program essentially keeps polling the file
"http://www.cs.rochester.edu/u/sarukkai/WordLabel.html" which is
constantly updated by the speech recognition system depending on the
speech input recognition result.

If the contents of the above mentioned file have been altered, then
the JAVA program LoadPage accesses this new URL page (flags an error
if invalid or not present). Then the appropriate link words (termed
referents) and the corresponding HTML addresses are extracted from
the accessed HTML source and stored in local files "page.Address"
and "page.Referent". The "page.Address" and "page.Referent" files are
in turn accessed by the speech recognition system and used to
make the word sets which are boosted or scored appropriately.
NOTE: The following JAVA program also keeps running in addition
to the speech recognition system and the other looping JAVA program
(see KeepLoading).
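The extraction step described above, pairing each link address with its upper-cased referent, can also be illustrated compactly. The sketch below is not the LoadPage program: it swaps in a regular expression for LoadPage's line-by-line string searching, and the class name LinkExtractor, the method extract and its headAddress parameter are hypothetical names chosen for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical regex-based counterpart of the link extraction in LoadPage. */
class LinkExtractor {
    // Matches <a ... href="address" ...>referent</a>, in either letter case.
    private static final Pattern ANCHOR = Pattern.compile(
        "<a\\b[^>]*\\bhref\\s*=\\s*\"([^\"]*)\"[^>]*>(.*?)</a>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns a map from link address to upper-cased referent, in document order. */
    static Map<String, String> extract(String html, String headAddress) {
        Map<String, String> links = new LinkedHashMap<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            String address = m.group(1);
            // Like LoadPage, treat addresses without "//" or "mailto:" as
            // relative and prefix them with the current page's directory.
            if (!address.contains("//") && !address.contains("mailto:")) {
                address = headAddress + address;
            }
            // Drop nested markup inside the anchor text before upper-casing.
            String referent = m.group(2).replaceAll("<[^>]*>", "").trim().toUpperCase();
            links.put(address, referent);
        }
        return links;
    }
}
```

Each map entry corresponds to one line of "page.Address" and the matching line of "page.Referent". Unlike LoadPage, this sketch also handles anchors whose text spans several lines, because the whole document is matched at once.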



import java.net.*;
import java.io.*;

class LoadPage {

    public static void main(String args[]) {
        // Initialize
        DataInputStream dis = null, currpage = null;
        String thisInputLine, inputLine,
               link = "", oldlink = "",
               linkAddress = "", ReferenceString = "", HeadAddress = "";
        int startAdd, endAdd;
        URL mypage = null;

        // Keep Looping
        while (true)
        {
            try {
                mypage = new URL("http://www.cs.rochester.edu/u/sarukkai/WordLabel.html");
            }
            catch (MalformedURLException me) {
                System.out.println("MalformedURLException: " + me); }
            try { currpage = new DataInputStream(mypage.openStream()); }
            catch (IOException ioe) {
                System.out.println("IOException: " + ioe);
            }
            try { link = currpage.readLine(); }
            catch (IOException ioe) {
                System.out.println("IOException: " + ioe);
            }

            // Check to see if the link address has changed
            // or has been updated by the speech recognition system.
            if (!link.equals(oldlink))
            {
                // Access the new HTML document and process to extract information
                try {
                    URL yahoo = new URL(link);
                    HeadAddress = link.substring(0, link.lastIndexOf("/") + 1);
                    if (oldlink.equals("")) oldlink = link;
                    FileOutputStream fout = new FileOutputStream("page.Address");
                    PrintStream myout = new PrintStream(fout);
                    FileOutputStream fout2 = new FileOutputStream("page.Referent");
                    PrintStream myout2 = new PrintStream(fout2);
                    myout.println(oldlink);
                    // Add some simple keywords; can also be appended by
                    // speech recognition system
                    myout2.println("BACK #");
                    myout.println("http://www.cs.rochester.edu/u/sarukkai/home.html");
                    myout2.println("HOME #");
                    myout.println("http://www.cs.rochester.edu/");
                    myout2.println("U of R C S #");

                    dis = new DataInputStream(yahoo.openStream());
                    thisInputLine = "";
                    while ((inputLine = dis.readLine()) != null) {
                        System.out.println(inputLine);
                        if (inputLine.indexOf("<area") != -1) continue;
                        if (inputLine.indexOf("<img") != -1) continue;

                        // possible link reference
                        if (inputLine.indexOf("HREF=\"") != -1)
                        {
                            startAdd = inputLine.indexOf("HREF=\"");
                            linkAddress = inputLine.substring(startAdd + 6,
                                inputLine.indexOf("\">", startAdd + 1));
                            if ((linkAddress.indexOf("//") == -1) &&
                                (linkAddress.indexOf("mailto:") == -1))
                                linkAddress = HeadAddress.concat(linkAddress);
                            System.out.println(linkAddress);
                            if (inputLine.indexOf("</A>") != -1)
                            {
                                ReferenceString = inputLine.substring(
                                    inputLine.lastIndexOf(">", inputLine.lastIndexOf("</A>") - 1) + 1,
                                    inputLine.indexOf("</A>"));
                                System.out.println(ReferenceString);
                                ReferenceString = ReferenceString.toUpperCase();
                                myout.println(linkAddress);
                                myout2.println(ReferenceString.concat(" #"));
                                linkAddress = "";
                                ReferenceString = "";
                            }
                            else
                                thisInputLine = thisInputLine.concat(inputLine);
                        }
                        else
                        if (inputLine.indexOf("href=\"") != -1)
                        {
                            startAdd = inputLine.indexOf("href=\"");
                            linkAddress = inputLine.substring(startAdd + 6,
                                inputLine.indexOf("\">", startAdd + 1));
                            if ((linkAddress.indexOf("//") == -1) &&
                                (linkAddress.indexOf("mailto:") == -1))
                                linkAddress = HeadAddress.concat(linkAddress);
                            System.out.println(linkAddress);
                            if (inputLine.indexOf("</a>") != -1)
                            {
                                ReferenceString = inputLine.substring(
                                    inputLine.lastIndexOf(">", inputLine.lastIndexOf("</a>") - 1) + 1,
                                    inputLine.indexOf("</a>"));
                                System.out.println(ReferenceString);
                                ReferenceString = ReferenceString.toUpperCase();
                                myout.println(linkAddress);
                                myout2.println(ReferenceString.concat(" #"));
                                linkAddress = "";
                                ReferenceString = "";
                            }
                            else
                                thisInputLine = thisInputLine.concat(inputLine);
                        }
                        else
                        if (inputLine.indexOf("</A>") != -1)
                        {
                            thisInputLine = thisInputLine.concat(inputLine);
                            ReferenceString = thisInputLine.substring(
                                thisInputLine.lastIndexOf(">", thisInputLine.indexOf("</A>") - 1) + 1,
                                thisInputLine.indexOf("</A>"));
                            ReferenceString = ReferenceString.toUpperCase();
                            myout.println(linkAddress);
                            myout2.println(ReferenceString.concat(" #"));
                            thisInputLine = "";
                            linkAddress = "";
                            ReferenceString = "";
                        }
                        else
                        if (inputLine.indexOf("</a>") != -1)
                        {
                            thisInputLine = thisInputLine.concat(inputLine);
                            ReferenceString = thisInputLine.substring(
                                thisInputLine.lastIndexOf(">", thisInputLine.indexOf("</a>") - 1) + 1,
                                thisInputLine.indexOf("</a>"));
                            ReferenceString = ReferenceString.toUpperCase();
                            myout.println(linkAddress);
                            myout2.println(ReferenceString.concat(" #"));
                            thisInputLine = "";
                            linkAddress = "";
                            ReferenceString = "";
                        }
                    }
                    dis.close();
                    myout.println("$");
                    myout2.println("$");
                    myout2.close();
                    myout.close();
                } catch (MalformedURLException me) {
                    System.out.println("MalformedURLException: " + me);
                } catch (IOException ioe) {
                    System.out.println("IOException: " + ioe);
                }

                // update previous link variable "oldlink"
                oldlink = link;
            }
        }
    }
}

END OF APPENDIX 1
APPENDIX 2: Modifications to the Speech Recognition System

Brief Description: Modifications done to the components of the speech
recognition system are indicated below.

****************************************************************

Notes: lm_tg_score(w1,w2,w) gives the trigram score.
lm_bg_score(w1,w) gives the bigram score.
The following functions boost the scores if
the word w is in the web triggered word
set stored in the array thisWordSet[].

All scores are of some type SCORE (typically
long int).

Typical values for the BoostFactor[]
entries are 0.1-0.5 and such values
can also be adapted.
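The effect of the boost factor can be illustrated with a small Java sketch. It assumes, as is common for integer recognizer scores but is not stated in this appendix, that a score is a negative log-probability-like value, so that scaling it by a factor between 0.1 and 0.5 moves it toward zero and thereby favors the word. The class name BoostedScore and the method boost are hypothetical.

```java
/** Hypothetical illustration of boosting a language model score. */
class BoostedScore {
    /**
     * Returns lmScore unchanged for words outside the web triggered word set
     * (or with an out-of-range index), and boostFactor * lmScore otherwise.
     */
    static int boost(int lmScore, int wordIndex, boolean[] inWordSet, double[] boostFactor) {
        if (wordIndex < 0 || wordIndex >= inWordSet.length || !inWordSet[wordIndex]) {
            return lmScore;  // normal score
        }
        // Multiply first, then truncate to an integer score.
        return (int) (boostFactor[wordIndex] * lmScore);
    }
}
```

For example, with a score of -1000 and a boost factor of 0.5, a word in the set is scored -500 while a word outside the set keeps -1000. Note that the multiplication has to happen before the integer conversion; converting a factor such as 0.5 to an integer first would give zero.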



SCORE lm_tg_score_boost (int w1, int w2, int w3)
{
    int indx;
    indx = w3;

    if (indx >= _NUM_WORDS) indx = -1;

    if (indx == -1) return( (int32) lm_tg_score(w1,w2,w3) );

    // if word is not in web-triggered word set then return normal score
    if (!thisWordSet[indx]) return( (int32) lm_tg_score(w1,w2,w3) );

    // or else if in web-triggered word set return boosted value
    // (the product is formed before the cast so that a fractional
    // boost factor is not truncated to zero)
    return( (int32) (BoostFactor[indx]*lm_tg_score(w1,w2,w3)) );
}

int32 lm_bg_score_boost (int32 w2, int32 w3)
{
    int indx;
    indx = w3;

    if (indx >= _NUM_WORDS) indx = -1;

    if (indx == -1) return( (int32) lm_bg_score(w2,w3) );

    if (!thisWordSet[indx]) return( (int32) lm_bg_score(w2,w3) );

    // If indx is in web-triggered word set return boosted score.
    return( (int32) (BoostFactor[indx]*lm_bg_score(w2,w3)) );
}



/*** This is typically executed before the search recognition
process begins for the current utterance ***/

// Update the latest word set information

mainboost_INIT()
{
    int i, indx[8];
    FILE *wordsetfile;
    char ww[40];

    OpenLexicon();
    Utterances++;

    /* The initialization is done currently by two passes over the
       page.Referent file; can be altered to one pass ***/

    LoadNewPage();

    /** INITIALIZE EVERYTHING ***/
    for (i=0; i<_NUM_WORDS; i++)
    {
        thisLattWordSet[i] = 0;
        thisWordSet[i] = 0;
    }

    /** Loading Word Set from Dictionary ***/
    wordsetfile = fopen("page.Referent","r");

    while ( (fscanf(wordsetfile,"%s",ww)!=EOF) && (strcmp(ww,"$")!=0) )
    {
        if (strcmp(ww,"#")==0) continue;
        GetWordIndex(ww,indx);
        for (i=0; i<8; i++)
            if (indx[i] != -1)
                if (LexiconLen[indx[i]] > 0)
                {
                    thisWordSet[indx[i]] = 1;
                    fprintf(debugfp,"%s ;",Lexicon[indx[i]]);
                }
        fprintf(debugfp,"\n");
    }
    fclose(wordsetfile);
}



/** The following routine is for loading the links and referents
that are updated in the files page.Referent and page.Address
by the looping JAVA program (Appendix 1) ****/

void LoadNewPage()
{
    FILE *wordsetfile, *linkFile;
    char ww[20];
    int i, j, maxindx, mm;
    float max;

    /** RELOAD NEW BOOK MARKS ***/
    for (i=0; i<_MAX_LINKS; i++)
    {
        strcpy(LinkWords[i],"");
        strcpy(LinkAddress[i],"");
    }
    i = 0;
    wordsetfile = fopen("page.Referent","r");
    linkFile = fopen("page.Address","r");

    while ( (fscanf(wordsetfile,"%s",ww)!=EOF) && (strcmp(ww,"$")!=0) )
    {
        fscanf(linkFile,"%s",LinkAddress[i]);
        strcpy(LinkWords[i],"");
        while (strcmp(ww,"#")!=0)
        {
            strcat(LinkWords[i]," ");
            /* The replacement character is illegible in the source
               listing; a space is assumed here. */
            for (mm=0; mm<strlen(ww); mm++)
                if (ww[mm]<'A' || ww[mm]>'Z') ww[mm] = ' ';
            strcat(LinkWords[i],ww);
            fscanf(wordsetfile,"%s",ww);
        }
        printf(" LINK: %s ; LINK WORDS: %s \n",LinkAddress[i],LinkWords[i]);
        i++;
        if (i>=_MAX_LINKS) printf("****ERRR NUM LINKS EXCEEDED ****\n");
    }
    fclose(linkFile); fclose(wordsetfile);
    NUMBER_LINKS = i;
}



/** The following shows what happens at the end of the speech
recognition process when a best sequence output path has been
determined; the appropriate matching home page link (if any)
is found and written out so that the two looping JAVA programs
can automatically update their Web page contexts ****/

/** The input hypstr is the actual best word sequence
output word string generated by the speech recognition
search process ***/

void MatchString(hypstr)
char hypstr[];
{
    FILE *wordsetfile, *linkFile;
    char ww[20], dummy[700], dw[100];
    int i, j, maxindx, mm;
    float max;

    if (DEMO)
    {
        strcpy(dummy,"");
        strcpy(dw,"");
        while (strcmp(dw,"#")!=0)
        {
            fscanf(demofile,"%s",dw);
            strcat(dummy," ");
            strcat(dummy,dw);
        }
        strcpy(hypstr,dummy);
    }

    /** Load link lists and match and see if anything matches
    with a hyper link referent/command ***/

    for (i=0; i<_MAX_LINKS; i++)
    {
        strcpy(LinkWords[i],"");
        strcpy(LinkAddress[i],"");
    }
    i = 0;
    wordsetfile = fopen("page.Referent","r");
    linkFile = fopen("page.Address","r");
    while ( (fscanf(wordsetfile,"%s",ww)!=EOF) && (strcmp(ww,"$")!=0) )
    {
        fscanf(linkFile,"%s",LinkAddress[i]);
        strcpy(LinkWords[i],"");
        while (strcmp(ww,"#")!=0)
        {
            strcat(LinkWords[i]," ");
            /* The replacement character is illegible in the source
               listing; a space is assumed here. */
            for (mm=0; mm<strlen(ww); mm++)
                if (ww[mm]<'A' || ww[mm]>'Z') ww[mm] = ' ';
            strcat(LinkWords[i],ww);
            fscanf(wordsetfile,"%s",ww);
        }
        printf(" LINK: %s ; LINK WORDS: %s \n",LinkAddress[i],LinkWords[i]);
        i++;
        if (i>=_MAX_LINKS) printf("****ERRR NUM LINKS EXCEEDED ****\n");
    }
    fclose(linkFile); fclose(wordsetfile);
    NUMBER_LINKS = i;
    max = -99999; maxindx = -1;
    printf(" ##### HYP %s #########\n",hypstr);
    for (i=0; i<NUMBER_LINKS; i++)
    {
        /* String align the hypothesis with all the possible */
        /* link word sequence referents */
        align_comp(LinkWords[i],hypstr);
        MatchCorr[i] = (float) 100.00*sent_corr/sent_ref_wds;
        printf(" MATHING WITH %s (%f) $$$$$$\n",LinkWords[i],MatchCorr[i]);

        /** Find Best Match ****/
        if (MatchCorr[i] > max)
        {
            max = MatchCorr[i];
            maxindx = i;
        }
    }

    if (maxindx==-1) return;

    printf(" **CLOSEST MATCH WAS %s (%s) %f **\n",
           LinkWords[maxindx],LinkAddress[maxindx],MatchCorr[maxindx]);

    /** If match accuracy/percent correct is greater than a threshold
    then update the appropriate page so as to communicate with
    the looping JAVA programs ****/

    if (MatchCorr[maxindx] > THRESHOLD)
    {
        /** SET NEW WEB PAGE CONTEXT ****/
        FILE *pageaddr;

pageaddr = fopen(7u/www/users/ip-ads/sajn^ 
        fprintf(pageaddr,"%s",LinkAddress[maxindx]);
        fclose(pageaddr);
    }
}

END OF APPENDIX 2
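The link selection performed by MatchString can be illustrated with the following Java sketch. It is a simplification under stated assumptions: the string alignment routine align_comp is not reproduced in this appendix, so the sketch substitutes a plain count of link words that also occur in the hypothesis. The class name LinkMatcher and both method names are hypothetical, and the threshold is passed in by the caller (the description mentions 30%).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical word-overlap stand-in for the alignment-based link matching. */
class LinkMatcher {
    /** Percentage of the link's words that also occur in the hypothesis. */
    static double percentCorrect(String linkWords, String hypothesis) {
        List<String> hyp = new ArrayList<>(Arrays.asList(hypothesis.trim().split("\\s+")));
        String[] ref = linkWords.trim().split("\\s+");
        int correct = 0;
        for (String w : ref) {
            if (hyp.remove(w)) {  // each hypothesis word can be matched only once
                correct++;
            }
        }
        return 100.0 * correct / ref.length;
    }

    /** Index of the best-matching link, or -1 if no link exceeds the threshold. */
    static int bestLink(String[] linkWords, String hypothesis, double threshold) {
        int best = -1;
        double max = -1.0;
        for (int i = 0; i < linkWords.length; i++) {
            double score = percentCorrect(linkWords[i], hypothesis);
            if (score > max) {
                max = score;
                best = i;
            }
        }
        return (best >= 0 && max > threshold) ? best : -1;
    }
}
```

With the links "BACK", "HOME", "U OF R C S" and "SPEECH RECOGNITION RESEARCH", the hypothesis "GO TO SPEECH RESEARCH" matches two of the three words of the last link and selects it at a 30% threshold, while a hypothesis sharing no words with any link selects nothing. A true alignment, as in MatchString, would additionally respect word order.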
APPENDIX 3: SOURCE OF EXAMPLE .HTML

<APPLET CODE="KeepLoading.class" WIDTH=350 HEIGHT=325>
</APPLET>

The source of the applet KeepLoading.java is as follows:

import java.awt.Graphics;
import java.io.*;
import java.net.*;
import java.applet.*;

public class KeepLoading extends java.applet.Applet {
    String link = "http://www.cs.rochester.edu/u/sarukkai/home.html";
    String oldlink = "";
    URL mypage = null;

    public void init() {
        //stuff
    }

    public void paint(Graphics g) {
        //stuff
    }

    public void start() {
        while (true)
        {
            URL yahoo = null;
            URL mypage = null;
            DataInputStream dis = null;

            // WordLabel.html is the page that the speech recognizer updates
            // depending on the speech input
            try { yahoo = new URL("http://www.cs.rochester.edu/u/sarukkai/WordLabel.html"); }
            catch (Exception e) { }

            /******* EXCEPTION OCCURS HERE *****/
            try { dis = new DataInputStream(yahoo.openStream()); }
            catch (Exception e) { }

            try { link = dis.readLine(); }
            catch (Exception e) { }

            if (!link.equals(oldlink)) openpage();
            // Pause briefly between polls
            try { Thread.sleep(1); } catch (InterruptedException e) { }
            oldlink = link;
            try { dis.close(); } catch (Exception e) {
                System.out.println("Unable to open Input File"); }
        }
    }

    public void openpage() {
        try { mypage = new URL(link); }
        catch (Exception e) { System.out.println("Wrong URL"); }
        AppletContext mycon = getAppletContext();
        mycon.showDocument(mypage, "NetSpeak");
        /** System.out.println(link); **/
    }
}

END OF APPENDIX 3
We claim:

1. A method for user speech actuation of access to 
information stored in a computer system, the method com- 
prising: 

storing the information in a memory of the computer 
system; 

displaying a selected subset of the information via the 
computer on a display viewable by the user; 

forming a web triggered word set from a portion of the 
information stored in the computer, including a plural- 
ity of individual words contained in the displayed 
subset; 

receiving a speech input comprising one or more spoken
words from the user;

performing a speech recognition process based on the
speech input from the user to determine statistically a
set of probable words based on the speech input of the
user, including:

processing the input speech in accordance with a set of
language/acoustic model and speech recognition search
parameters which produce probability scores for at
least individual words;

modifying the language/acoustic model and speech rec- 
ognition search parameters dynamically using the web 
triggered word set to boost the probability score of at 
least individual ones of the probable words; and 

updating the display to display a new subset of the 
information in accordance with the set of probable 
words determined from the speech recognition search 
as modified using the web-triggered word set.

2. A method according to claim 1 in which the computer 
system includes a local computer operable by the user, a 
remote computer at which the information is stored and a 
network link for coupling the local and remote computers, 
the displaying step including accessing the information from 
the local computer via a web browser. 

3. A method according to claim 1 in which the information 
is stored in a hypertext markup language (HTML) source 
document and the step of forming a web triggered word set 
includes forming a set of words that includes a source word 
set including individual words extracted from the (HTML) 
source document. 

4. A method according to claim 1 in which the step of 
forming a web triggered word set includes storing in a short 
term cache one or more previously-formed web triggered 
word sets for inclusion in a current web triggered word set. 

5. A method according to claim 1 in which the step of 
forming a web triggered word set includes modifying the 
web triggered word set responsive to updating to display a 
new subset of the information. 

6. A method according to claim 1 in which the step of 
forming a web triggered word set includes forming a set of 
words that includes words selected from at least one of a 
displayed subset of the information and a basic word set of 
command and function words chosen a priori. 

7. A method according to claim 1 in which the step of 
performing a speech recognition process which includes the 
processing and modifying steps includes estimating a fit 
between an acoustic observation X of the speech input word 
sequence and a possible word sequence W according to a 
Bayes-type evaluation function

$F(S_{LM}(W \mid H),\ S_{AC}(X \mid W, H))$, where
$S_{LM}$ is a language model score,
$S_{AC}$ is an acoustic model score, and
H is the web triggered word set.

8. A method according to claim 1 in which the step of 
performing a speech recognition process which includes the 
processing and modifying steps includes estimating a fit 
between an acoustic observation X of the speech input word 
sequence and a possible word sequence W according to an 
altered Bayes-like scoring function 

$\Pr(X \mid W) \times \Pr(W)^{\Omega(W,H)}$
using a special set of Omega(W,H) values for words W 
belonging to a predicted/web-triggered set H, so as to 
improve the scores of such words. 

9. A computer system for user speech actuation of access 
to stored information, the system comprising: 

a computer including a central processing unit, a memory 
and a user input/output interface including a micro- 
phone for input of user speech utterances and audible 
sound signal processing circuitry; 

means for accessing and storing information in the 
memory of the computer system; 

a speech recognition processor operating on the computer 
system for recognizing words including individual 
words based on the input speech utterances of the user 
in accordance with a set of language/acoustic model 
and speech recognition search parameters which pro- 
duce a probability score for at least the individual 
words; 

means for forming a web triggered word set from a 
selected subset of information in the document, the web 
triggered word set including a plurality of individual 
words contained in the displayed subset; 

means for modifying the language/acoustic model and 
speech recognition search parameters dynamically 
using the web triggered word set to boost the probabil- 
ity score of at least the individual words; and 

means responsive to the speech recognition processor for 
generating a word string based on the probability score 
as boosted by the modifying means for input to the 
accessing and storing means to initiate a change in the 
information accessed. 

10. A system according to claim 9 including: 

means for displaying a first portion of the selected subset 
of the information via the computer on a display 
viewable by the user, so that the user can formulate 
speech utterances based on the displayed portion of the 
information; and 

means for updating the display to show a second portion 
of the selected subset of the information in accordance 
with the word string determined from the speech rec- 
ognition search. 

11. A system according to claim 9 in which the informa- 
tion stored in the memory comprises a first document and the 
accessing and storing means includes means for directing 
the system to access a different document responsive to the 
word string. 

12. A method according to claim 9 in which the computer 
system includes a local computer operable by the user, a 
remote computer at which the information is stored and a 
network link for coupling the local and remote computers, 
the displaying means including means for accessing the 
information from the local computer via a web browser. 